Improved word confidence estimation using long range features
Abstract
This paper describes experiments in improving word confidence estimation using document- and task-level features of the hypothesized word sequence from a recognizer. The improved confidence estimates are shown to improve information extraction performance, specifically named entity (NE) recognition. The detected names can then be used to further improve confidence estimation in a multi-pass NE recognition framework.
Related papers
Vocabulary-independent word confidence measure using subword features
This paper discusses how to compute word-level confidence measures based on sub-word features for large-vocabulary speaker-independent speech recognition. The performance of confidence measure using features at word, phone and senone level is experimentally studied. A framework of transformation function based system using sub-word features is proposed for high performance confidence estimation...
Error Detection for Statistical Machine Translation Using Linguistic Features
Automatic error detection is desired in the post-processing to improve machine translation quality. The previous work is largely based on confidence estimation using system-based features, such as word posterior probabilities calculated from N best lists or word lattices. We propose to incorporate two groups of linguistic features, which convey information from outside machine translation syste...
Using Sub-word-level Information for Confidence Estimation with Conditional Random Field Models
The task of word-level confidence estimation (CE) for automatic speech recognition (ASR) systems stands to benefit from the combination of suitably defined input features from multiple information sources. However, the information sources of interest may not necessarily operate at the same level of granularity as the underlying ASR system. The research described here builds on previous work on ...
The CU-HTK March 2000 Hub5E Transcription System
This paper describes the Cambridge University HTK (CU-HTK) system developed for the NIST March 2000 evaluation of English conversational telephone speech transcription (Hub5E). A range of new features have been added to the HTK system used in the 1998 Hub5 evaluation, and the changes taken together have resulted in an 11% relative decrease in word error rate on the 1998 evaluation test set. Maj...
An Open Source Toolkit for Word-level Confidence Estimation in Machine Translation
Recently, a growing need for Confidence Estimation (CE) for Statistical Machine Translation (SMT) systems in Computer Aided Translation (CAT) has been observed. However, most of the CE toolkits are optimized for a single target language (mainly English) and, as far as we know, none of them are dedicated to this specific task and freely available. This paper presents an open-source toolkit for predic...